Benchmark GPUArrays AK reduction implementation #2815

Open
wants to merge 1 commit into
base: master

christiangnrd (Member) commented Jul 22, 2025

Do not merge. Not a draft so the benchmarks run.

codecov bot commented Jul 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.06%. Comparing base (e561e7a) to head (cb54c77).

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2815       +/-   ##
===========================================
- Coverage   89.62%   75.06%   -14.57%     
===========================================
  Files         153      153               
  Lines       13276    13213       -63     
===========================================
- Hits        11899     9918     -1981     
- Misses       1377     3295     +1918     

christiangnrd marked this pull request as ready for review July 22, 2025 17:27

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/perf/runbenchmarks.jl b/perf/runbenchmarks.jl
index ad2564a7d..0b84d307d 100644
--- a/perf/runbenchmarks.jl
+++ b/perf/runbenchmarks.jl
@@ -1,6 +1,6 @@
 # benchmark suite execution and codespeed submission
 using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="akreduce")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "akreduce")
 
 using CUDA
 

github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 8fd59db | Previous: e561e7a | Ratio |
|---|---|---|---|
| latency/precompile | 43267306340.5 ns | 43393378645 ns | 1.00 |
| latency/ttfp | 6993590450 ns | 7099882121 ns | 0.99 |
| latency/import | 3573563594 ns | 3463869374 ns | 1.03 |
| integration/volumerhs | 9628360.5 ns | 9623663 ns | 1.00 |
| integration/byval/slices=1 | 146972 ns | 146714 ns | 1.00 |
| integration/byval/slices=3 | 425877 ns | 425787 ns | 1.00 |
| integration/byval/reference | 145006 ns | 144967 ns | 1.00 |
| integration/byval/slices=2 | 286766 ns | 286209 ns | 1.00 |
| integration/cudadevrt | 103536 ns | 103426 ns | 1.00 |
| kernel/indexing | 14132 ns | 14196 ns | 1.00 |
| kernel/indexing_checked | 14886 ns | 14906 ns | 1.00 |
| kernel/occupancy | 693.9054054054054 ns | 759.2189781021898 ns | 0.91 |
| kernel/launch | 2168.8888888888887 ns | 2287.222222222222 ns | 0.95 |
| kernel/rand | 14649 ns | 15792 ns | 0.93 |
| array/reverse/1d | 19757.5 ns | 19624 ns | 1.01 |
| array/reverse/2d | 24702 ns | 24928.5 ns | 0.99 |
| array/reverse/1d_inplace | 10355 ns | 10448 ns | 0.99 |
| array/reverse/2d_inplace | 11869.5 ns | 12006 ns | 0.99 |
| array/copy | 20679 ns | 20990 ns | 0.99 |
| array/iteration/findall/int | 157631 ns | 159128.5 ns | 0.99 |
| array/iteration/findall/bool | 139481 ns | 139832 ns | 1.00 |
| array/iteration/findfirst/int | 136892 ns | 162546 ns | 0.84 |
| array/iteration/findfirst/bool | 130429 ns | 164393.5 ns | 0.79 |
| array/iteration/scalar | 72427 ns | 72740 ns | 1.00 |
| array/iteration/logical | 206893 ns | 216803.5 ns | 0.95 |
| array/iteration/findmin/1d | 117265.5 ns | 45968 ns | 2.55 |
| array/iteration/findmin/2d | 258985.5 ns | 96433 ns | 2.69 |
| array/reductions/reduce/Int64/1d | 46056.5 ns | 44555 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1 | 87317 ns | 48607 ns | 1.80 |
| array/reductions/reduce/Int64/dims=2 | 47322 ns | 63682.5 ns | 0.74 |
| array/reductions/reduce/Int64/dims=1L | 167957.5 ns | 88842 ns | 1.89 |
| array/reductions/reduce/Int64/dims=2L | 1158683 ns | 89417.5 ns | 12.96 |
| array/reductions/reduce/Float32/1d | 43684 ns | 34490 ns | 1.27 |
| array/reductions/reduce/Float32/dims=1 | 89077 ns | 50554 ns | 1.76 |
| array/reductions/reduce/Float32/dims=2 | 43317 ns | 59726 ns | 0.73 |
| array/reductions/reduce/Float32/dims=1L | 116708.5 ns | 52852 ns | 2.21 |
| array/reductions/reduce/Float32/dims=2L | 1107589 ns | 70052.5 ns | 15.81 |
| array/reductions/mapreduce/Int64/1d | 46570 ns | 45547 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=1 | 87306 ns | 48423.5 ns | 1.80 |
| array/reductions/mapreduce/Int64/dims=2 | 47437 ns | 61443 ns | 0.77 |
| array/reductions/mapreduce/Int64/dims=1L | 167764 ns | 88888 ns | 1.89 |
| array/reductions/mapreduce/Int64/dims=2L | 1161470 ns | 87908.5 ns | 13.21 |
| array/reductions/mapreduce/Float32/1d | 44861 ns | 34245.5 ns | 1.31 |
| array/reductions/mapreduce/Float32/dims=1 | 89153 ns | 47287 ns | 1.89 |
| array/reductions/mapreduce/Float32/dims=2 | 43310 ns | 59743 ns | 0.72 |
| array/reductions/mapreduce/Float32/dims=1L | 117063 ns | 53154 ns | 2.20 |
| array/reductions/mapreduce/Float32/dims=2L | 1106921 ns | 70503 ns | 15.70 |
| array/broadcast | 20460 ns | 20866 ns | 0.98 |
| array/copyto!/gpu_to_gpu | 11132 ns | 12817 ns | 0.87 |
| array/copyto!/cpu_to_gpu | 215248 ns | 213873 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 282942.5 ns | 284406 ns | 0.99 |
| array/accumulate/Int64/1d | 124532 ns | 125170 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 83259 ns | 83519 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 157436 ns | 158002 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709001 ns | 1709945.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 965915 ns | 966571 ns | 1.00 |
| array/accumulate/Float32/1d | 108726 ns | 109737 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80262 ns | 80823.5 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 147251 ns | 147778 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1618529 ns | 1619194 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698100 ns | 698530 ns | 1.00 |
| array/construct | 1267.6 ns | 1279.85 ns | 0.99 |
| array/random/randn/Float32 | 42943 ns | 47253.5 ns | 0.91 |
| array/random/randn!/Float32 | 24898 ns | 24573 ns | 1.01 |
| array/random/rand!/Int64 | 27331 ns | 27294 ns | 1.00 |
| array/random/rand!/Float32 | 8769.666666666666 ns | 8724.333333333334 ns | 1.01 |
| array/random/rand/Int64 | 29760 ns | 29633 ns | 1.00 |
| array/random/rand/Float32 | 12950 ns | 12902 ns | 1.00 |
| array/permutedims/4d | 59908 ns | 61250.5 ns | 0.98 |
| array/permutedims/2d | 53801 ns | 54865 ns | 0.98 |
| array/permutedims/3d | 54901 ns | 55511 ns | 0.99 |
| array/sorting/1d | 2756214.5 ns | 2757710 ns | 1.00 |
| array/sorting/by | 3368533 ns | 3344132.5 ns | 1.01 |
| array/sorting/2d | 1087613 ns | 1080389 ns | 1.01 |
| cuda/synchronization/stream/auto | 1052.5454545454545 ns | 1015.8333333333334 ns | 1.04 |
| cuda/synchronization/stream/nonblocking | 7580.2 ns | 7618.9 ns | 0.99 |
| cuda/synchronization/stream/blocking | 812.78125 ns | 799.1530612244898 ns | 1.02 |
| cuda/synchronization/context/auto | 1181.4 ns | 1164.1 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 8037.299999999999 ns | 7651.4 ns | 1.05 |
| cuda/synchronization/context/blocking | 925.2666666666667 ns | 895.8490566037735 ns | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.
